Construct the QA bot

It's finally time to construct the QA bot!

Let's start off by creating a new Python file that will store your bot. Click on the button below to create a new Python file, and call it qabot.py. If, for whatever reason, the button does not work, make the new file by going to File --> New Text File. Be sure to save the file as qabot.py.

In the following sections, you will populate qabot.py with your bot.

Import necessary libraries

Inside qabot.py, import the following from gradio, ibm_watsonx_ai, langchain_ibm, langchain, and langchain_community. The imported classes are necessary for initializing models with the correct credentials, splitting text, initializing a vector store, loading PDFs, generating a question-answer retriever, and using Gradio.

```python
from ibm_watsonx_ai.foundation_models import ModelInference
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams
from ibm_watsonx_ai.metanames import EmbedTextParamsMetaNames
from ibm_watsonx_ai import Credentials
from langchain_ibm import WatsonxLLM, WatsonxEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_community.document_loaders import PyPDFLoader
from langchain.chains import RetrievalQA
import gradio as gr

# You can use this section to suppress warnings generated by your code:
def warn(*args, **kwargs):
    pass
import warnings
warnings.warn = warn
warnings.filterwarnings('ignore')
```

Initialize the LLM

You will now initialize the LLM by creating an instance of WatsonxLLM, a class in langchain_ibm. WatsonxLLM can use several underlying foundation models. In this particular example, you will use Mixtral 8x7B, although you could have used other models, such as Llama 3.1 405B. For a list of foundation models available on watsonx.ai, refer to this.

To initialize the LLM, paste the following into qabot.py. Note that you are initializing the model with a temperature of 0.5, and allowing for the generation of a maximum of 256 tokens:

```python
## LLM
def get_llm():
    model_id = 'mistralai/mixtral-8x7b-instruct-v01'
    parameters = {
        GenParams.MAX_NEW_TOKENS: 256,
        GenParams.TEMPERATURE: 0.5,
    }
    project_id = "skills-network"
    watsonx_llm = WatsonxLLM(
        model_id=model_id,
        url="https://us-south.ml.cloud.ibm.com",
        project_id=project_id,
        params=parameters,
    )
    return watsonx_llm
```

Define the PDF document loader

Next, you will define the PDF document loader. You will use the PyPDFLoader class from the langchain_community library to load PDF documents. The syntax is quite straightforward. First, you create the PDF loader as an instance of PyPDFLoader. Then, you load the document and return the loaded document. To incorporate the PDF loader in your bot, add the following to qabot.py:

```python
## Document loader
def document_loader(file):
    loader = PyPDFLoader(file.name)
    loaded_document = loader.load()
    return loaded_document
```
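To make the loader's output concrete: .load() returns a list of document objects, one per PDF page, each pairing the extracted text with metadata. The stdlib sketch below is illustrative only — the class name, fields, and sample values are stand-ins, though LangChain's real Document class has the same general shape (page_content plus a metadata dict):

```python
from dataclasses import dataclass, field

# Simplified sketch of the kind of object .load() returns: one document per
# PDF page, pairing extracted text with metadata. This is an illustrative
# stand-in, not LangChain's actual class.
@dataclass
class Document:
    page_content: str
    metadata: dict = field(default_factory=dict)

loaded = [
    Document("Text extracted from page 1...", {"source": "report.pdf", "page": 0}),
    Document("Text extracted from page 2...", {"source": "report.pdf", "page": 1}),
]
print(len(loaded), "pages loaded")
print(loaded[0].metadata["page"])
```

Downstream steps (the text splitter and vector store) operate on lists of such objects.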

Define the text splitter

The PDF document loader loads the document but does not split it into chunks when using the .load() method. Consequently, you must define a document splitter that will split the text into chunks. Add the following code to qabot.py to define such a text splitter. Note that, in this example, you are defining a RecursiveCharacterTextSplitter with a chunk size of 1000, although other splitters or parameter values are possible:

```python
## Text splitter
def text_splitter(data):
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=50,
        length_function=len,
    )
    chunks = text_splitter.split_documents(data)
    return chunks
```
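To build intuition for what chunk_size and chunk_overlap control, here is a deliberately naive sketch of fixed-size chunking with overlap. It is NOT LangChain's actual recursive algorithm, which first tries to split on separators such as paragraphs and sentences before falling back to characters — but it shows why consecutive chunks share text:

```python
# Naive fixed-size chunking with overlap (illustrative only).
def naive_split(text, chunk_size, chunk_overlap):
    chunks = []
    step = chunk_size - chunk_overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

sample = "abcdefghij" * 3  # 30 characters
pieces = naive_split(sample, chunk_size=10, chunk_overlap=3)
print(pieces[0])  # the first 10 characters
print(pieces[1])  # starts 7 characters in, repeating the previous chunk's last 3
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which helps retrieval later.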

Define the vector store

Now that you have a way to load the PDF into text and split that text into chunks, you must define a way to embed and store those chunks in a vector database. Add the following code to qabot.py to define a function that embeds the chunks using a yet-to-be-defined embedding model and stores the embeddings in a ChromaDB vector store:

```python
## Vector db
def vector_database(chunks):
    embedding_model = watsonx_embedding()
    vectordb = Chroma.from_documents(chunks, embedding_model)
    return vectordb
```
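Conceptually, the vector store embeds each chunk as a vector and answers a query by returning the chunks whose vectors are most similar to the query's vector. The toy example below illustrates that idea with made-up 3-dimensional "embeddings" and cosine similarity; real models such as Slate 125M produce vectors with hundreds of dimensions, and Chroma handles the indexing for you:

```python
import math

# Toy similarity search: the 3-dimensional "embeddings" are invented for
# illustration, not produced by any real embedding model.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

chunks = {
    "The cat sat on the mat.": [0.9, 0.1, 0.0],
    "Quarterly revenue grew 12%.": [0.0, 0.2, 0.9],
    "Dogs and cats are pets.": [0.6, 0.4, 0.2],
}
query_vector = [0.85, 0.2, 0.05]  # pretend embedding of "Tell me about cats"

best = max(chunks, key=lambda text: cosine(chunks[text], query_vector))
print(best)
```

This nearest-chunk lookup is exactly the service the retriever (defined below) will provide on top of the vector store.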

Define the embedding model

The above vector_database() function assumes the existence of a watsonx_embedding() function that loads an instance of an embedding model. This embedding model is needed to convert chunks of text into vector representations. The following code defines a watsonx_embedding() function that returns an instance of WatsonxEmbeddings, a class from langchain_ibm that generates embeddings. In this particular case, the embeddings are generated using IBM's Slate 125M English embeddings model. Paste this code into the qabot.py file:

```python
## Embedding model
def watsonx_embedding():
    embed_params = {
        EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS: 3,
        EmbedTextParamsMetaNames.RETURN_OPTIONS: {"input_text": True},
    }
    watsonx_embedding = WatsonxEmbeddings(
        model_id="ibm/slate-125m-english-rtrvr",
        url="https://us-south.ml.cloud.ibm.com",
        project_id="skills-network",
        params=embed_params,
    )
    return watsonx_embedding
```

Note that it does not matter to Python that watsonx_embedding() is defined after vector_database(). The order of these definitions could have been reversed, which would have resulted in no change in the underlying functionality of the bot.
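The reason definition order does not matter here is that Python resolves a name when a call actually runs, not when the function is defined. A minimal example:

```python
# Python looks up 'inner' when outer() is called, not when outer is defined,
# so a function body may reference a function defined later in the file --
# as long as the later definition exists by the time the call happens.
def outer():
    return inner() + 1

def inner():
    return 41

print(outer())  # works: both definitions exist before outer() is called
```

The same applies to vector_database() calling watsonx_embedding(): all that matters is that both are defined before the bot processes its first request.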

Define the retriever

Now that your vector store is defined, you must define a retriever that retrieves chunks of the document from it. In this particular case, you will define a vector store-based retriever that retrieves information using a simple similarity search. To do so, add the following lines to qabot.py:

```python
## Retriever
def retriever(file):
    splits = document_loader(file)
    chunks = text_splitter(splits)
    vectordb = vector_database(chunks)
    retriever = vectordb.as_retriever()
    return retriever
```

Define a question-answering chain

Finally, it is time to define a question-answering chain! In this particular example, you will use RetrievalQA from langchain, a chain that performs natural-language question-answering over a data source using retrieval-augmented generation (RAG). Add the following code to qabot.py to define a question-answering chain:

```python
## QA Chain
def retriever_qa(file, query):
    llm = get_llm()
    retriever_obj = retriever(file)
    qa = RetrievalQA.from_chain_type(llm=llm,
                                     chain_type="stuff",
                                     retriever=retriever_obj,
                                     return_source_documents=False)
    response = qa.invoke(query)
    return response['result']

Let's recap how all the elements of the bot are linked. RetrievalQA accepts an LLM (get_llm()) and a retriever object (an instance generated by retriever()) as arguments. The retriever, however, is built on the vector store (vector_database()), which in turn needs an embedding model (watsonx_embedding()) and chunks generated by the text splitter (text_splitter()). The text splitter, in turn, needs raw text, and that text is loaded from the PDF using PyPDFLoader. This effectively defines the core functionality of your QA bot!
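The chain_type="stuff" argument deserves a word: a "stuff" chain simply stuffs all retrieved chunks into one prompt and sends it to the LLM. The sketch below illustrates that idea; fake_llm and the prompt template are illustrative stand-ins, not the actual templates LangChain uses internally:

```python
# Stripped-down sketch of a "stuff" chain: concatenate the retrieved chunks
# into a single prompt and hand it to the model. fake_llm is a stand-in.
def fake_llm(prompt):
    return "(model answer based on: " + prompt[:40] + "...)"

def stuff_chain(retrieved_chunks, query):
    context = "\n\n".join(retrieved_chunks)
    prompt = (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )
    return fake_llm(prompt)

answer = stuff_chain(
    ["Chunk one about revenue.", "Chunk two about expenses."],
    "How did revenue change?",
)
print(answer)
```

"Stuff" works well when the retrieved chunks fit in the model's context window; LangChain also offers other chain types (such as map-reduce) for cases where they do not.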

Set up the Gradio interface

Given that you have created the core functionality of the bot, the final item to define is the Gradio interface. Your Gradio interface should include:

  • A file upload functionality (provided by the File class in Gradio)
  • An input textbox where the question can be asked (provided by the Textbox class in Gradio)
  • An output textbox where the answer is displayed (provided by the Textbox class in Gradio)

Add the following code to qabot.py to add the Gradio interface:

```python
# Create Gradio interface
rag_application = gr.Interface(
    fn=retriever_qa,
    allow_flagging="never",
    inputs=[
        gr.File(label="Upload PDF File", file_count="single", file_types=['.pdf'], type="filepath"),  # Drag and drop file upload
        gr.Textbox(label="Input Query", lines=2, placeholder="Type your question here...")
    ],
    outputs=gr.Textbox(label="Output"),
    title="RAG Chatbot",
    description="Upload a PDF document and ask any question. The chatbot will try to answer using the provided document."
)
```

Add code to launch the application

Finally, you need to add one more line to qabot.py to launch your application using port 7860:

```python
# Launch the app
rag_application.launch(server_name="0.0.0.0", server_port=7860)
```

After adding the above line, save qabot.py.

Verify qabot.py

Your qabot.py should now look like the following:

```python
from ibm_watsonx_ai.foundation_models import ModelInference
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams
from ibm_watsonx_ai.metanames import EmbedTextParamsMetaNames
from ibm_watsonx_ai import Credentials
from langchain_ibm import WatsonxLLM, WatsonxEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_community.document_loaders import PyPDFLoader
from langchain.chains import RetrievalQA
import gradio as gr

# You can use this section to suppress warnings generated by your code:
def warn(*args, **kwargs):
    pass
import warnings
warnings.warn = warn
warnings.filterwarnings('ignore')

## LLM
def get_llm():
    model_id = 'mistralai/mixtral-8x7b-instruct-v01'
    parameters = {
        GenParams.MAX_NEW_TOKENS: 256,
        GenParams.TEMPERATURE: 0.5,
    }
    project_id = "skills-network"
    watsonx_llm = WatsonxLLM(
        model_id=model_id,
        url="https://us-south.ml.cloud.ibm.com",
        project_id=project_id,
        params=parameters,
    )
    return watsonx_llm

## Document loader
def document_loader(file):
    loader = PyPDFLoader(file.name)
    loaded_document = loader.load()
    return loaded_document

## Text splitter
def text_splitter(data):
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=50,
        length_function=len,
    )
    chunks = text_splitter.split_documents(data)
    return chunks

## Vector db
def vector_database(chunks):
    embedding_model = watsonx_embedding()
    vectordb = Chroma.from_documents(chunks, embedding_model)
    return vectordb

## Embedding model
def watsonx_embedding():
    embed_params = {
        EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS: 3,
        EmbedTextParamsMetaNames.RETURN_OPTIONS: {"input_text": True},
    }
    watsonx_embedding = WatsonxEmbeddings(
        model_id="ibm/slate-125m-english-rtrvr",
        url="https://us-south.ml.cloud.ibm.com",
        project_id="skills-network",
        params=embed_params,
    )
    return watsonx_embedding

## Retriever
def retriever(file):
    splits = document_loader(file)
    chunks = text_splitter(splits)
    vectordb = vector_database(chunks)
    retriever = vectordb.as_retriever()
    return retriever

## QA Chain
def retriever_qa(file, query):
    llm = get_llm()
    retriever_obj = retriever(file)
    qa = RetrievalQA.from_chain_type(llm=llm,
                                     chain_type="stuff",
                                     retriever=retriever_obj,
                                     return_source_documents=False)
    response = qa.invoke(query)
    return response['result']

# Create Gradio interface
rag_application = gr.Interface(
    fn=retriever_qa,
    allow_flagging="never",
    inputs=[
        gr.File(label="Upload PDF File", file_count="single", file_types=['.pdf'], type="filepath"),  # Drag and drop file upload
        gr.Textbox(label="Input Query", lines=2, placeholder="Type your question here...")
    ],
    outputs=gr.Textbox(label="Output"),
    title="RAG Chatbot",
    description="Upload a PDF document and ask any question. The chatbot will try to answer using the provided document."
)

# Launch the app
rag_application.launch(server_name="0.0.0.0", server_port=7860)
```